-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
topn with granularity regression fixes #17565
topn with granularity regression fixes #17565
Conversation
changes: * fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actual able to use the pooled algorithm, and incorrectly used the pool algorithm in cases where it must use the heap algorithm, a regression from apache#16533 * fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from apache#16533
@@ -275,7 +275,7 @@ private static boolean canUsePooledAlgorithm( | |||
final int numBytesToWorkWith = resultsBuf.capacity(); | |||
final int numValuesPerPass = numBytesPerRecord > 0 ? numBytesToWorkWith / numBytesPerRecord : cardinality; | |||
|
|||
return numValuesPerPass <= cardinality; | |||
return numValuesPerPass >= cardinality; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comparision change caught my eye...the new cmp seem to be more in line with the intention - but it seems to me that
depending on some conditions here ; this cardinality
's value might be -1
in some cases
is that ok to answer true
if cardinality
is -1
?
note: if cardinality == -1
; then both the old and the new logic returns true
- is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea this is fair, it is not really a problem in practice because currently the column capabilities should not report as dictionary encoded if the cardinality is -1, so we return false earlier in the method before we get here, but this seems worth explicitly checking for just in case.
@@ -245,6 +245,11 @@ private static boolean canUsePooledAlgorithm( | |||
final int numBytesPerRecord | |||
) | |||
{ | |||
if (cardinality < 0) { | |||
// unknown cardinality doesn't work with the pooled algorith which requires an exact count of dictionary ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
algorithm (spelling)
@@ -391,7 +391,7 @@ public boolean hasNext() | |||
if (delegate != null && delegate.hasNext()) { | |||
return true; | |||
} else { | |||
if (!cursor.isDone() && granularizer.currentOffsetWithinBucket()) { | |||
if (granularizer.currentOffsetWithinBucket()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was the cursor.isDone()
here removed simply because it's redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, i noticed granularizer.currentOffsetWithinBucket also checks isDone so it seemed nicer this way, this didn't cause any issues, just noticed while i was looking over stuff
aggregator.init(resultsBuffer, position); | ||
aggregator.aggregate(resultsBuffer, position); | ||
positionToAllocate += aggregatorSize; | ||
if (granularizer.currentOffsetWithinBucket()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was the granularizer.currentOffsetWithinBucket()
check added here (& the other prototypes) to enable skipping of empty granular buckets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, this was a bug, i did it as an if statement wrapping the loop so we wouldn't need to check currentOffsetWithinBucket()
twice per loop since the granularizer advance methods also call currentOffsetWithinBucket
…when multi-passes is required even wihen query granularity is not all
@@ -19,8 +19,8 @@ | |||
} | |||
], | |||
"context": { | |||
"useCache": "true", | |||
"populateCache": "true", | |||
"useCache": "false", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was trying to see if could allow the tests to pass without setting the new context parameter, but it didn't seem to help, so changed to use the new context parameter.
@@ -89,6 +89,7 @@ public class QueryContexts | |||
public static final String UNCOVERED_INTERVALS_LIMIT_KEY = "uncoveredIntervalsLimit"; | |||
public static final String MIN_TOP_N_THRESHOLD = "minTopNThreshold"; | |||
public static final String CATALOG_VALIDATION_ENABLED = "catalogValidationEnabled"; | |||
public static final String TOPN_USE_MULTI_PASS_POOLED_QUERY_GRANULARITY = "topNuseMultiPassPooledQueryGranularity"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a javadoc comment here explaining what this is for, and linking to this PR (or a related issue). It's helpful to have that kind of thing for any undocumented parameter.
Since @gianm comments were non blocking, going ahead with merge since this blocks 31.0.1 release. |
@@ -112,12 +115,14 @@ public static CursorGranularizer create( | |||
|
|||
private CursorGranularizer( | |||
Cursor cursor, | |||
Granularity granularity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clintropolis Do we need this field ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* topn with granularity regression fixes changes: * fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actual able to use the pooled algorithm, and incorrectly used the pool algorithm in cases where it must use the heap algorithm, a regression from apache#16533 * fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from apache#16533 * move defensive check outside of loop * more test * extra layer of safety * move check outside of loop * fix spelling * add query context parameter to allow using pooled algorithm for topN when multi-passes is required even wihen query granularity is not all * add comment, revert IT context changes and add new context flag
* topn with granularity regression fixes (#17565) * topn with granularity regression fixes changes: * fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actual able to use the pooled algorithm, and incorrectly used the pool algorithm in cases where it must use the heap algorithm, a regression from #16533 * fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from #16533 * move defensive check outside of loop * more test * extra layer of safety * move check outside of loop * fix spelling * add query context parameter to allow using pooled algorithm for topN when multi-passes is required even wihen query granularity is not all * add comment, revert IT context changes and add new context flag * remove unused
Description
changes:
This PR has: